Skip to content

[state sync] migrate to new bucket format#25925

Merged
phoenix-o merged 1 commit intomainfrom
state_sync_alt0
Mar 26, 2026
Merged

[state sync] migrate to new bucket format#25925
phoenix-o merged 1 commit intomainfrom
state_sync_alt0

Conversation

@phoenix-o
Copy link
Contributor

@phoenix-o phoenix-o commented Mar 20, 2026

Description

migrates state sync to the new bucket format


Release notes

Check each box that your changes affect. If none of the boxes relate to your changes, release notes aren't required.

For each box you select, include information after the relevant heading that describes the impact of your changes that a user might notice and any actions they must take to implement updates.

  • Protocol:
  • Nodes (Validators and Full nodes):
  • gRPC:
  • JSON-RPC:
  • GraphQL:
  • CLI:
  • Rust SDK:
  • Indexing Framework:

@vercel
Copy link

vercel bot commented Mar 20, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

Project Deployment Actions Updated (UTC)
sui-docs Ready Ready Preview, Comment Mar 25, 2026 3:52pm
2 Skipped Deployments
Project Deployment Actions Updated (UTC)
multisig-toolkit Ignored Ignored Preview Mar 25, 2026 3:52pm
sui-kiosk Ignored Ignored Preview Mar 25, 2026 3:52pm

Request Review

@phoenix-o phoenix-o temporarily deployed to sui-typescript-aws-kms-test-env March 20, 2026 18:29 — with GitHub Actions Inactive
@phoenix-o phoenix-o temporarily deployed to sui-typescript-aws-kms-test-env March 20, 2026 18:31 — with GitHub Actions Inactive
@phoenix-o phoenix-o temporarily deployed to sui-typescript-aws-kms-test-env March 20, 2026 18:55 — with GitHub Actions Inactive
@phoenix-o phoenix-o temporarily deployed to sui-typescript-aws-kms-test-env March 20, 2026 19:03 — with GitHub Actions Inactive
@phoenix-o phoenix-o temporarily deployed to sui-typescript-aws-kms-test-env March 20, 2026 19:49 — with GitHub Actions Inactive
@phoenix-o phoenix-o marked this pull request as ready for review March 20, 2026 20:11
@phoenix-o phoenix-o requested a review from a team as a code owner March 20, 2026 20:11
@phoenix-o phoenix-o temporarily deployed to sui-typescript-aws-kms-test-env March 20, 2026 20:11 — with GitHub Actions Inactive
@phoenix-o phoenix-o requested a review from bmwill March 20, 2026 20:11
Copy link
Contributor

@amnn amnn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@phoenix-o, could you motivate your use of the indexing framework here, as well as some of the changes you need to make to it?

  • Why does state sync need those changes, and are those needs general to all users of the indexing framework or specific to state sync?
  • How much of the indexing framework are you using?

It's possible/likely that the best option is to not use the indexing framework, but implement a small/custom interface over object_store instead that caters exactly to the node software's needs.

@phoenix-o
Copy link
Contributor Author

@amnn: State sync currently uses the old ingestion framework. This PR migrates it(last callsite) and utilizes the new checkpoint format.
Only IngestionClient is used to download the checkpoints.
There are not many changes to the framework itself.
A new optional parameter is added remote_store_extra_settings. It's empty by default, so it does not change the behavior of any existing clients.
This allows dynamically passing extra settings to the underlying object store library for more precise configuration.
For instance here StateSync needs to pass down credentials for requester-pays behavior.
I only need this for AWS at the moment, but I updated GCP and Azure as well (can revert if needed)

@amnn
Copy link
Contributor

amnn commented Mar 23, 2026

Only IngestionClient is used to download the checkpoints.

If only IngestionClient is used and even within that you only need to be able to fetch checkpoints from an object store, then I'm still leaning towards not using the indexing framework for this, but just implementing a thin wrapper around object_store, and there's a lot hanging off that dependency that you do not need/use. I am also thinking of the fact that we want to move the indexing framework out of the mono-repo at some point and publish it to crates.io.

For instance here StateSync needs to pass down credentials for requester-pays behavior.

Separately, I think this issue has come up before, so maybe it is worth solving in the framework, even if state sync doesn't use it:

  • @nickvikeras I recall you needed to solve this problem for the analytics indexer, and you were able to do this without updating the framework -- could you explain what you did (and whether you have a path towards generalising it in the framework?)
  • @rushrs what does the set-up look like for our indexers -- do they use requester pays buckets and if so how do they get configured? I believe we expose configuration via environment variables for this kind of thing, but I'm not sure we use it.

@phoenix-o
Copy link
Contributor Author

@amnn: sounds good, I’ll revert the changes to IngestionClientArgs. Just thought it might be helpful for more precise future configuration of underlying object store clients

Copy link
Contributor

@amnn amnn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @phoenix-o, this is now mostly outside of my domain, so will leave the rest of the review to @bmwill or @nickvikeras

sui-simulator.workspace = true
sui-protocol-config.workspace = true
sui-data-ingestion-core.workspace = true
sui-indexer-alt-framework.workspace = true
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should be able to get rid of this now, right?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am a little confused at the state of the PR. It looks like Ashok's preference is to move off of the indexing framework's ingestion client and use object_store directly, but we are still using the ingest client in this revision.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I’m still using the client. Only reverted the changes to the client itself (i.e. I construct object store instance outside)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At that point, why use the ingestion client? You are pulling in all the code for building indexers (and which will one day not even be in the mono-repo) but all you need is the naming scheme for fetching a checkpoint, right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the point is to minimize code duplication. We’ll need the same functionality in a couple of other places (i.e. sui-tool). Maybe it should live in some other common place. But ok, removing the sui-indexer-alt-framework dependency from this PR

@amnn amnn requested review from amnn and removed request for amnn March 24, 2026 11:25
@nickvikeras
Copy link
Contributor

Only IngestionClient is used to download the checkpoints.

If only IngestionClient is used and even within that you only need to be able to fetch checkpoints from an object store, then I'm still leaning towards not using the indexing framework for this, but just implementing a thin wrapper around object_store, and there's a lot hanging off that dependency that you do not need/use. I am also thinking of the fact that we want to move the indexing framework out of the mono-repo at some point and publish it to crates.io.

For instance here StateSync needs to pass down credentials for requester-pays behavior.

Separately, I think this issue has come up before, so maybe it is worth solving in the framework, even if state sync doesn't use it:

* @nickvikeras I recall you needed to solve this problem for the analytics indexer, and you were able to do this without updating the framework -- could you explain what you did (and whether you have a path towards generalising it in the framework?)

* @rushrs what does the set-up look like for our indexers -- do they use requester pays buckets and if so how do they get configured? I believe we expose configuration via environment variables for this kind of thing, but I'm not sure we use it.

This was for reading out of the parquet bucket, not the checkpoint bucket, but the code is here -

// Apply custom headers (e.g., for requester-pays buckets)
if let Some(headers_map) = custom_headers {
let mut headers = HeaderMap::new();
for (key, value) in headers_map {
headers.insert(
HeaderName::try_from(key.as_str())?,
HeaderValue::from_str(value)?,
);
}
client_options = client_options.with_default_headers(headers);
}
GoogleCloudStorageBuilder::new()
.with_client_options(client_options)
.with_bucket_name(bucket)
.with_service_account_path(service_account_path.to_string_lossy())
.build()
.map(|s| Arc::new(s) as Arc<dyn object_store::ObjectStore>)
.context("Failed to create GCS store")

I think we want to do something similar in the indexing framework so we can enable requester pays on our public checkpoint buckets.

remote_store_options: Vec<(String, String)>,
) -> Arc<dyn ObjectStore> {
let timeout_secs = 5;
let client_options = ClientOptions::new()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need to add .with_default_headers() to set the requester pays headers?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don’t think so. The current sui-data-ingestion-core client doesn’t set it and works with requester-pays buckets. But once accepted, I’ll state-sync the mainnet FN with the new binary and verify that it works

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah, spun up FN locally. It syncs from the S3 requester-pays bucket

@phoenix-o phoenix-o merged commit 2533f2e into main Mar 26, 2026
64 of 65 checks passed
@phoenix-o phoenix-o deleted the state_sync_alt0 branch March 26, 2026 13:36
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants